Current Issue: October-December 2025, Issue Number 4 (5 Articles)
Speech recognition in noisy environments has long posed a challenge. Air conduction microphones (ACMs), the devices typically used, are susceptible to environmental noise. In this work, a customized bone conduction microphone (BCM) system based on a piezoelectric micromachined ultrasonic transducer is developed to capture speech through real-time bone conduction (BC), while a commercial ACM is integrated for simultaneous capture of speech through air conduction (AC). The system enables simpler and more robust BC speech capture. The BC speech capture achieves a signal-to-noise amplitude ratio over five times greater than that of AC speech capture in an environment with a noise level of 68 dB. Instead of using only AC-captured speech, both BC- and AC-captured speech are input into a speech enhancement (SE) module. The noise-insensitive BC-captured speech serves as a reference to adapt the SE backbone operating on the AC-captured speech. The two types of speech are fused, and noise suppression is applied to generate enhanced speech. Compared with the original noisy speech, the enhanced speech achieves a character error rate reduction of over 20%, approaching the speech recognition accuracy of clean speech. The results indicate that this speech enhancement method based on the fusion of BC- and AC-captured speech efficiently integrates the features of both types of speech, thereby improving speech recognition accuracy in noisy environments. This work presents an innovative system designed to efficiently capture BC speech and enhance speech recognition in noisy environments.
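As a rough illustration of the fusion idea, the sketch below combines the noisy AC spectrogram with the noise-insensitive BC magnitude used as a soft-mask reference. The masking rule, the 0.1 floor, and all other constants are hypothetical stand-ins for the paper's learned SE backbone, not the authors' method.

```python
# Illustrative BC/AC fusion sketch (not the paper's learned SE backbone).
import numpy as np
from scipy.signal import stft, istft

def fuse_bc_ac(ac, bc, fs=16000, nperseg=512):
    """Fuse noisy air-conduction (ac) and noise-robust bone-conduction (bc)
    speech of equal length in the spectrogram domain."""
    _, _, AC = stft(ac, fs=fs, nperseg=nperseg)
    _, _, BC = stft(bc, fs=fs, nperseg=nperseg)
    # BC speech is largely insensitive to airborne noise, so its magnitude
    # can act as a spectral speech-presence reference for the AC signal.
    ref = np.abs(BC) / (np.abs(BC).max() + 1e-8)
    mask = ref / (ref + 0.1)  # soft mask; 0.1 is an arbitrary floor
    # Keep AC phase and high-frequency detail, attenuating time-frequency
    # bins where the BC reference indicates little speech energy.
    _, enhanced = istft(mask * AC, fs=fs, nperseg=nperseg)
    return enhanced
```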
With the development of the marine economy and the increase in marine activities, deep saturation diving has gained significant attention. Helium speech communication is indispensable for saturation diving operations and is a critical technology for deep saturation diving, serving as the sole communication method that ensures the smooth execution of such operations. This study introduces deep learning into helium speech recognition and proposes a spectrogram-based dual-model helium speech recognition method. First, we extract spectrogram features from the helium speech. Then, we combine a deep fully convolutional neural network with connectionist temporal classification (CTC) to form an acoustic model, in which the spectrogram features of helium speech are used as input to convert speech signals into phonetic sequences. Finally, a maximum entropy Markov model (MEMM) is employed as the language model to convert the phonetic sequences into word outputs, which is treated as a dynamic programming problem: we use the Viterbi algorithm to find the optimal path that decodes the phonetic sequences into word sequences. The simulation results show that the method can effectively recognize helium speech, with a recognition rate of 97.89% for isolated words and 95.99% for continuous helium speech.
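The decoding step described above is classical dynamic programming. The following is a minimal log-domain Viterbi decoder; the toy initial, transition, and emission tables stand in for the MEMM language model, whose actual parameterization the abstract does not specify.

```python
# Generic log-domain Viterbi decoder (toy stand-in for the MEMM decoding).
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """obs: observed phone indices [T]. Returns the most probable hidden
    word-state path under the given log-probability tables."""
    T, n_states = len(obs), log_init.shape[0]
    score = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    score[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # [prev_state, cur_state]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = [int(score[-1].argmax())]              # trace best path backwards
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: two word states, three phone symbols.
log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_emit = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], log_init, log_trans, log_emit))  # -> [0, 0, 1]
```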
In mobility service environments, recognizing the user's condition and driving status is critical to driving safety and experience. While speech emotion recognition is one possible feature for predicting driver status, current emotion recognition models have a fundamental limitation: they are designed to classify a single emotion class rather than multiple classes, which prevents a comprehensive understanding of the driver's condition and intention during driving. In addition, mobility devices inherently generate noise that can degrade speech emotion recognition performance in the mobility service. Considering mobility service environments, we investigate models that detect multiple emotions while mitigating noise issues. In this paper, we propose a speech emotion recognition model based on an autoencoder for multi-emotion detection. First, we analyze Mel Frequency Cepstral Coefficients (MFCCs) to design the specific features. We then develop a multi-emotion detection scheme based on an autoencoder that detects multiple emotions with substantially more flexibility than existing models. Using the proposed scheme, we investigate mobility noise impacts and mitigation approaches and evaluate the resulting performance.
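One plausible realization of such a scheme, sketched below, trains one small autoencoder per emotion on that emotion's MFCC vectors and, at test time, reports every emotion whose reconstruction error falls under its threshold, so several emotions can be detected at once. The architecture, mean pooling, and thresholds are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: per-emotion autoencoders enabling multi-emotion detection.
import librosa
import torch
import torch.nn as nn

N_MFCC = 13

def mfcc_vector(path):
    """Load audio and return a mean-pooled MFCC feature vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return torch.tensor(mfcc.mean(axis=1), dtype=torch.float32)

class EmotionAE(nn.Module):
    """Tiny autoencoder trained only on one emotion's feature vectors."""
    def __init__(self, dim=N_MFCC, hidden=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)
    def forward(self, x):
        return self.dec(self.enc(x))

def detect_emotions(x, autoencoders, thresholds):
    """Report every emotion whose autoencoder reconstructs x well;
    thresholds would be tuned on held-out data."""
    detected = []
    with torch.no_grad():
        for emotion, ae in autoencoders.items():
            err = torch.mean((ae(x) - x) ** 2).item()
            if err < thresholds[emotion]:
                detected.append(emotion)
    return detected
```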
Background/Objectives: Previous research has shown that listeners may use acoustic cues for speech processing that are perceived during brief segments in the noise when there is an optimal signal-to-noise ratio (SNR). This “glimpsing” effect requires higher cognitive skills than the speech tasks used in typical audiometric evaluations. Purpose: The aim of this study was to investigate the use of an online test of speech processing in noise in listeners with typical hearing sensitivity (TH, defined as thresholds ≤ 25 dB HL) who were asked to determine the gender of the subject in sentences presented at increasing levels of continuous and interrupted noise. Methods: This was a repeated-measures design with three factors (SNR, noise type, and syntactic complexity). Study Sample: Participants with self-reported TH (N = 153, ages 18–39 years, mean age = 20.7 years) who passed an online hearing screening were invited to complete an online questionnaire. Data Collection and Analysis: Participants completed a sentence recognition task under four SNRs (−6, −9, −12, and −15 dB), two syntactic complexity settings (subject-relative and object-relative center-embedded), and two noise types (interrupted and continuous). They listened to 64 sentences through their own headphones/earphones, presented in an online format at a user-selected comfortable listening level. Their task was to identify the gender of the person performing the action in each sentence. Results: Significant main effects of all three factors, as well as the SNR by noise-type two-way interaction, were identified (p < 0.05). This interaction indicated that the effect of SNR on sentence comprehension was more pronounced in the continuous noise condition than in the interrupted noise condition. Conclusions: Listeners with self-reported TH benefited from the glimpsing effect in the interrupted noise even at low SNRs (i.e., −15 dB). The evaluation of glimpsing may be a sensitive measure of auditory processing beyond the traditional word recognition used in clinical evaluations of persons who report hearing challenges, and it may hold promise for the development of auditory training programs.
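For readers unfamiliar with how such stimuli are constructed, the sketch below scales a noise signal to a target SNR and, for the interrupted condition, gates it on and off so that brief "glimpses" of cleaner speech remain. The 4 Hz gating rate is an illustrative assumption; the study's actual interruption parameters are not given in the abstract.

```python
# Sketch: mix speech with noise at a target SNR, optionally interrupted.
import numpy as np

def mix_at_snr(speech, noise, snr_db, fs=16000, interrupted=False, rate_hz=4):
    noise = noise[: len(speech)]
    # Scale noise power so that 10*log10(P_speech / P_noise) = snr_db.
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    noise = noise * np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    if interrupted:
        # Square-wave gate: noise is present half the time at rate_hz,
        # leaving periodic gaps ("glimpses") of relatively clean speech.
        t = np.arange(len(speech)) / fs
        noise = noise * (np.sin(2 * np.pi * rate_hz * t) > 0)
    return speech + noise
```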
The evolution of Natural Language Processing represents a journey from basic statistical methods to advanced artificial intelligence systems. Starting with foundational approaches like Bag of Words and TF-IDF, the field progressed through neural architectures including RNNs and Transformers, culminating in today's large language models. Each advancement has elevated capabilities in language understanding, translation, and generation. The transformation continues through multimodal integration, efficiency enhancements, reasoning improvements, and trustworthy AI development, while addressing fundamental technical challenges that will shape the future landscape of artificial intelligence.